How could POSMO's Transport Mode Detection be improved further using publicly available data?
Author
Lukas Bieri (bieriluk) & Valentin Hett (hettval1)
Published
June 30, 2023
Abstract
Computational Movement Analysis is a widely researched field that aims to analyse and validate trajectory data to identify correlations, patterns and outliers in movement. GPS point data form an elementary baseline for the analysis of movement patterns in various applications. This project focuses on Transport Mode Detection (TMD) in computational movement analysis using GPS tracking data from the POSMO app. The project uses the POSMO data and implements algorithms in R for data processing, analysis and visualisation using publicly available context data such as public transport networks. The results demonstrate the effectiveness of multi-criteria analysis for TMD even with limited optimisation of the underlying algorithms. The project, while having some limitations in the implementation, presents ideas for further improvements in TMD from POSMO data, including the addition of height modelling, accelerometer data and supervised learning algorithms.
Introduction
## 1. Information
### 1.1 Project Info
# Module: Patterns and Trends HS22
# Course: Semester Project
# Lecturer: Prof. Dr. Patrick Laube
# Assistant Lecturers: Nils Ratnaweera & Dominic Lüönd
# Authors: Valentin Hett (hettval1) & Lukas Bieri (bieriluk)
# Date: 30.06.2023
# Info: Most visualizations have been commented out due to buffer overflow, with the exception of the figures in this report.
Computational Movement Analysis is a widely researched field that uses algorithms and visual techniques to analyze and validate trajectory data to detect relationships, patterns and outliers in movement. However, current visualization systems predominantly target multilevel applications and macro-level results. A key component of mobility data processing is map matching, which involves matching GPS points to corresponding road network links to create accurate vehicle trajectories (Cai et al., 2018). GPS tracking is becoming increasingly important in movement analysis. Platforms such as Google Maps and other portals use GPS to track routes, locations and other records. One major limitation of GPS, however, is that it can only record positions and cannot provide context or semantics (Van der Spek et al., 2009). Even when apps offer support functions that ask people to fill in a movement protocol, the resulting data is often incomplete because users neglect to fill it in or no longer remember their trips (Sadeghian et al., 2022).
The tracking app POSMO is a mobility data platform for capturing and managing all types of traffic and for analyzing and visualizing how people use space. Students at ZHAW in the module “patterns and trends in environmental data” tracked their movement for several months using POSMO and analyzed their data using Computational Movement Analysis in R. These assignments relate primarily to the semester project, which involves spatial data analysis. POSMO has already implemented algorithms that determine the transport mode (TM) for recorded trajectories and reproduce them as routes. As with any GPS recording, the data contain noise and variability, which can lead to incorrect conclusions. Accurate Transport Mode Detection (TMD) is necessary for many movement analysis tasks. For example, various applications use GPS recordings for health assessments from running or cycling. Accurate TMD can also be used to improve public transport planning or to compute the most efficient and fuel-saving routes by car. Introducing context maps into a system can already improve detection. Many different approaches to TMD can be found in the literature. Sadeghian et al. (2022) showed that accurate TMD was possible using a combination of unsupervised and supervised learning algorithms with a spatial multi-criteria analysis.
For this project, we set out to answer the following research questions:
Can Transport Mode Detection (TMD) for the POSMO tracking data be improved using (a) the stepwise procedure described by Sadeghian et al. (2022) and (b) public transport data?
Where do we see potential for improving POSMO's TMD, based on our improvement trials and the literature?
Considering the constrained resources, including time and computational power, available for this semester project, the primary goal is not to completely revolutionize transportation mode detection (TMD) using GPS data. Instead, the project goals are to explore different approaches from the existing literature by implementing them to the best of our abilities and from this experience brainstorm potential ways to improve TMD from POSMO data. Additionally, it is assumed that not all these approaches can be implemented fully in the form of algorithms within the framework of this project.
### 1.2 Software used
# R version 4.2.1 (2022-06-23 ucrt) -- "Funny-Looking Kid"
# Copyright (C) 2022 The R Foundation for Statistical Computing
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# RStudio 2023.06.0+421 "Mountain Hydrangea" Release (583b465ecc45e60ee9de085148cd2f9741cc5214, 2023-06-05) for windows
# Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) RStudio/2023.06.0+421 Chrome/110.0.5481.208 Electron/23.3.0 Safari/537.36
Material and Methods
The POSMO app saves tracking data in its online datamap tool, from which it can be exported. The app already assigns a transport mode to each record, including train, car, bus, walking and even airplane (Genossenschaft Posmo Schweiz, n.d.). For this research project, only the transport modes walking (incl. running), biking, train (incl. gondolas & cable cars) and bus (incl. trams) are considered and compared. Ships and aerial vehicles were not included. To keep the computation manageable and because of the limited availability of transport network data, the analysis was limited to the canton of Zurich. For the different TMD improvement approaches we mostly followed the method set out by Sadeghian et al. (2022), with a focus on multi-criteria analysis.
All data processing, analysis and visualization is done in R (4.2.1/2022-06-23) using RStudio (2023.06.0+421) and the packages: “ggplot2”, “dplyr”, “tidyr”, “readr”, “zoo”, “data.table”, “sf”, “terra”, “tmap”, “stats”, “randomForest”, “lubridate”, “trajr”, “gstat”, “geosphere”, “nngeo”, “vegan”, “hms”, “tibble”, “useful”, “DescTools”, “utils” and “janitor”. The packages were last updated on 23.06.2023.
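One criterion that a multi-criteria analysis of this kind can use is how close a GPS fix lies to a known transport network. The following is a minimal base-R sketch of that idea, not the implementation used in this project. The helper names `dist_point_segment` and `dist_point_line`, the toy coordinates and the 25 m threshold are all assumptions made for illustration; coordinates are assumed to be in a projected system with metre units, such as LV95.

```r
# Hypothetical helper: shortest distance from point p to the segment a-b.
# All arguments are c(x, y) vectors in a projected CRS (units: metres).
dist_point_segment <- function(p, a, b) {
  ab <- b - a
  len2 <- sum(ab^2)
  # Degenerate segment: fall back to the point-to-point distance
  if (len2 == 0) return(sqrt(sum((p - a)^2)))
  # Position of the projection of p along a-b, clamped to the segment
  u <- max(0, min(1, sum((p - a) * ab) / len2))
  sqrt(sum((p - (a + u * ab))^2))
}

# Hypothetical helper: distance from p to a polyline (two-column matrix)
dist_point_line <- function(p, line) {
  d <- vapply(
    seq_len(nrow(line) - 1),
    function(i) dist_point_segment(p, line[i, ], line[i + 1, ]),
    numeric(1)
  )
  min(d)
}

# Toy data (invented coordinates, not taken from the POSMO data set)
rail  <- rbind(c(0, 0), c(100, 0), c(100, 100))
fixes <- rbind(c(50, 5), c(120, 50), c(300, 300))

# Distance of every fix to the line, and a near/far flag
dist_to_rail <- apply(fixes, 1, dist_point_line, line = rail)
near_rail <- dist_to_rail <= 25  # assumed threshold, for illustration only
```

With real data, the same criterion would more likely be computed with the spatial functions of the `sf` package on the imported route layers. The hand-written version above only makes the geometry explicit.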
# Install (if necessary) and load all necessary packages with this function
ipak <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, repos = "http://cran.us.r-project.org", dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
packages <- c("ggplot2", "dplyr", "tidyr", "readr", "zoo", "data.table", "sf", "terra", "tmap", "stats", "randomForest", "lubridate", "trajr", "gstat", "geosphere", "nngeo", "vegan", "hms", "tibble", "useful", "DescTools", "utils", "janitor")
ipak(packages)

# Set the tmap mode to "view"
tmap_mode(mode = "view")
All necessary data sets are loaded into R, reprojected (where necessary) and filtered. An exploratory data analysis (EDA) is done for the POSMO data to determine the appropriate settings for data cleaning and outlier removal.
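A typical exploratory step for choosing cleaning settings is to derive step length, time lag and speed between consecutive fixes and to look at their distribution. The sketch below illustrates this on an invented five-point track. The column names `steplength`, `timelag`, `speed` and `spike`, and the 40 m/s cut-off, are assumptions for illustration and not the settings used in this project.

```r
# Toy trajectory (invented values): projected coordinates in metres
track <- data.frame(
  datetime = as.POSIXct("2023-01-01 08:00:00", tz = "UTC") + c(0, 10, 20, 30, 40),
  X = c(0, 10, 20, 500, 40),
  Y = c(0, 0, 0, 0, 0)
)

# Step length (m), time lag (s) and speed (m/s) relative to the previous fix
track$steplength <- c(NA, sqrt(diff(track$X)^2 + diff(track$Y)^2))
track$timelag    <- c(NA, as.numeric(diff(track$datetime), units = "secs"))
track$speed      <- track$steplength / track$timelag

# Inspect the distribution before deciding on a cut-off
summary(track$speed)

# Assumed cut-off of 40 m/s, chosen only for this toy example
max_speed <- 40

# A single misplaced fix produces a fast step towards it AND away from it,
# so a fix is flagged only if both neighbouring steps exceed the cut-off
speed_out <- c(track$speed[-1], NA)
track$spike <- !is.na(track$speed) & !is.na(speed_out) &
  track$speed > max_speed & speed_out > max_speed
```

Flagging on both neighbouring steps avoids removing the valid fix that follows an outlier, which a simple `speed > max_speed` filter would also discard.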
## 3. Preprocessing
### 3.1 Import, check and transport data
#### 3.1.1 Boundaries
# Import the boundaries data set
st_layers("datasets/swissTLMRegio_BOUNDARIES_LV95.gdb")
kanton_zh <- st_read("datasets/swissTLMRegio_BOUNDARIES_LV95.gdb", layer = "TLMRegio_KANTONSGEBIET")
kanton_zh <- kanton_zh |>
  filter(NAME == "Zürich")

# Check if the coordinate system is correctly assigned
st_crs(kanton_zh)

# Visualize to verify
# tm_shape(kanton_zh) +
#   tm_polygons() +
#   tm_basemap("Esri.WorldImagery")

#### 3.1.2 Posmo data
# Import the raw unverified set
posmo <- read_delim("datasets/posmo_2023-01-01T00-00-00_2023-06-16T23-59-59_unvalidated_def.csv", delim = ",")

# Import the manually verified data set (in the POSMO datamap online tool)
posmo_valid <- read_delim("datasets/posmo_2023-01-01T00-00-00_2023-06-16T23-59-59_validated_def.csv", delim = ",")

# Check if the import got the time zone for the POSIXct column correct
str(posmo)
str(posmo_valid)
Sys.time()

# Store the data frame as a spatial data frame, transform the coordinate system
# from WGS84 (i.e. EPSG 4326) to CH1903+ LV95 (EPSG 2056) and filter it to the
# canton of Zurich (intersect)
posmo <- st_as_sf(posmo, coords = c("lon_x", "lat_y"), crs = 4326) |>
  st_transform(2056) |>
  st_filter(kanton_zh, .predicate = st_intersects)

# Same for the validated data
posmo_valid <- st_as_sf(posmo_valid, coords = c("lon_x", "lat_y"), crs = 4326) |>
  st_transform(2056) |>
  st_filter(kanton_zh, .predicate = st_intersects)

# Check the results in a table and by visualization
head(posmo)
head(posmo_valid)
# tm_shape(posmo) +
#   tm_dots(col = "red") +
#   tm_basemap("Esri.WorldImagery")
# tm_shape(posmo_valid) +
#   tm_dots(col = "red") +
#   tm_basemap("Esri.WorldImagery")

# Extract the coordinates into separate columns to use them for Euclidean distance calculation
posmo_coordinates <- st_coordinates(posmo)
posmo <- cbind(posmo, posmo_coordinates)

# Same for the validated data
posmo_valid_coordinates <- st_coordinates(posmo_valid)
posmo_valid <- cbind(posmo_valid, posmo_valid_coordinates)

#### 3.1.3 Railway routes data
# Check the layers of the gdb
st_layers("datasets/swissTLMRegio_Produkt_LV95.gdb")

# Import the railway layer
train_routes <- st_read("datasets/swissTLMRegio_Produkt_LV95.gdb", layer = "TLMRegio_Railway")

# Check if the coordinate system is correctly assigned
st_crs(train_routes)

# Filter for "Normalspurbahn", "Schmalspurbahn", "Standseilbahn", "Seilbahn",
# "Gondelbahn", "Sessellift" and "Autoverlad", exclude "Güterbahn", "Museumsbahn",
# "Bahn ausser Betrieb", "Bahn im Bau" and limit it to the canton of Zurich (intersect)
train_routes <- train_routes |>
  filter(OBJVAL != 3 & UNDERCONST == 0) |>
  st_filter(kanton_zh, .predicate = st_intersects)

# Add train stops
train_stops <- st_read("datasets/swissTLMRegio_Produkt_LV95.gdb", layer = "TLMRegio_Terminal")
train_stops <- train_stops |>
  filter(OBJVAL == 1) |>
  st_filter(kanton_zh, .predicate = st_intersects)

# Visualize to verify
# tm_shape(train_routes) +
#   tm_lines(col = "red") +
#   tm_shape(train_stops) +
#   tm_dots(col = "red") +
#   tm_basemap("Esri.WorldImagery")

#### 3.1.4 Bus & tram data
# Check the layers of the gpkg
st_layers("datasets/Linien_des_offentlichen_Verkehrs_-OGD.gpkg")

# Import the layer with all the public transport lines (filtered at download to exclude railway "S-Bahn")
bus_routes <- st_read("datasets/Linien_des_offentlichen_Verkehrs_-OGD.gpkg", layer = "ZVV_LINIEN_L")

# Check if the coordinate system is correctly assigned
st_crs(bus_routes)

# Filter to the canton of Zurich (intersect). This does not exclude segments that
# start in the canton and leave it, but that doesn't seem to be an issue for public
# transport, as the canton border is an arbitrarily set system boundary
bus_routes <- bus_routes |>
  st_filter(kanton_zh, .predicate = st_intersects)

# Visualize to verify
# tm_shape(bus_routes) +
#   tm_lines() +
#   tm_basemap("Esri.WorldImagery")

#### 3.1.5 Road network data
# Check the layers of the gdb
st_layers("datasets/swissTLMRegio_Produkt_LV95.gdb")

# Import the layer with all major roads and filter it for the canton of Zurich
roads <- st_read("datasets/swissTLMRegio_Produkt_LV95.gdb", layer = "TLMRegio_Road")
roads <- roads |>
  st_filter(kanton_zh, .predicate = st_intersects)

# Visualize to verify
# tm_shape(roads) +
#   tm_lines(col = "red") +
#   tm_basemap("Esri.WorldImagery")

### 3.2 Getting an overview & EDA
#### 3.2.1 For how long were the individuals tracked? Are there gaps? Were all individuals tracked concurrently or sequentially?
# Check the posmo data by inspecting it in detail
head(posmo)
tail(posmo)
head(posmo_valid)
tail(posmo_valid)
class(posmo$datetime)
class(posmo_valid$datetime)
tz(posmo$datetime)
tz(posmo_valid$datetime)

#### 3.2.2 How many individuals were tracked
# Make sure all data is from one individual
posmo$user_id |>
  unique()

#### 3.2.3 List of all transport modes
# Create a list of all transport modes in the POSMO data incl. numerical codes for the TMs
unique(posmo$transport_mode)
unique(posmo_valid$transport_mode)
numbers <- c(0, 1, 2, 3, 4, 5, 6, 8)
names <- c("Unkonwn", "Walk", "Car", "Bus", "Train", "Bike", "Tram", "Other")
transport_mode <- c(NA, "Walk", "Car", "Bus", "Train", "Bike", "Tram", "Other1")
all_transport_modes <- data.frame(numbers, names, transport_mode)
all_transport_modes

# Join the transport modes with the raw data to have the numerical codes for TM in the data frames
posmo <- posmo |>
  left_join(all_transport_modes, by = "transport_mode") |>
  rename(tm_unval = numbers)
posmo_valid <- posmo_valid |>
  left_join(all_transport_modes, by = "transport_mode") |>
  rename(tm_val = numbers)
The public transport data is sourced from open data platforms and governmental GIS databases. We used railway and boundary data from swissTLMRegio (swisstopo, 2022a, 2022b) and bus data from the Zurich Transport Network (ZVV) (Verkehrsbetriebe Zürich VBZ, 2022). The POSMO data is tracking data from one student in the module who voluntarily provided their tracking data to the class. For this tracking data set, we also have a validated version with ground-truth transport modes. It was manually validated for TM from memory using the POSMO datamap online tool. Because of the high workload involved, the POSMO segmentation of trajectories was left unchanged during validation.
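Having both an unvalidated and a validated label for the same fixes makes it possible to quantify how well a detection performs. The following is a minimal sketch of such a comparison using a confusion matrix, overall accuracy and per-mode recall. The vectors `tm_unval` and `tm_val` below are invented toy labels, not values from the POSMO data set, and the metrics are a generic choice rather than the evaluation reported in this project.

```r
# Toy labels (invented): detected vs. manually validated transport mode
tm_unval <- c("Walk", "Walk", "Bus", "Train", "Bike", "Bus",  "Walk", "Train")
tm_val   <- c("Walk", "Bike", "Bus", "Train", "Bike", "Tram", "Walk", "Bus")

# Use a common set of levels so the table is square even if a mode
# occurs in only one of the two vectors
modes <- sort(unique(c(tm_unval, tm_val)))
confusion <- table(
  detected  = factor(tm_unval, levels = modes),
  validated = factor(tm_val, levels = modes)
)

# Share of fixes where the detected mode matches the validated one
accuracy <- sum(diag(confusion)) / sum(confusion)

# Per validated mode: share of its fixes that were detected correctly
recall <- diag(confusion) / colSums(confusion)

confusion
accuracy
recall
```

Per-mode recall is useful alongside overall accuracy because a frequent mode such as walking can dominate the accuracy figure and hide poor detection of rarer modes.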
# Visualize the context data used in one figure
figure_1 <- tm_shape(kanton_zh) +
  tm_borders(col = "red", lwd = 3) +
  tm_shape(train_routes) +
  tm_lines(col = "green") +
  tm_shape(roads) +
  tm_lines(col = "black") +
  tm_shape(bus_routes) +
  tm_lines(col = "blue") +
  tm_add_legend(
    type = "fill",
    labels = c("Canton Zurich", "Railway lines", "Major Roads", "Bus/Tram routes"),
    col = c("red", "green", "black", "blue"),
    border.lwd = 0.5,
    title = "Data used for MCA"
  )
figure_1